The general formulation of a linear combination of population means 
permits a wide range of research questions to be tested within the context of 
ANOVA. However, it has been stressed in many research areas that the 
homogeneous variances assumption is frequently violated. To accommodate 
the heterogeneity of variance structure, the Welch-Satterthwaite procedure 
is commonly used as an alternative to the t test for detecting the substantive 
significance of a linear combination of mean effects. This article presents 
two approaches to power and sample size calculations for the 
Welch-Satterthwaite test. The usefulness and diversity of the suggested techniques 
are illustrated with two of the potential applications in meta and moderation 
analyses. The numerical assessments showed that the proposed approaches 
outperform the existing methods on the accuracy of power calculations and 
sample size determinations for meta and moderation studies. Computer 
algorithms are also developed to implement the recommended procedures in 
actual research designs. 


Within the context of analysis of variance (ANOVA), it is often 
desirable to perform comparisons among group means to provide specific 
answers to critical research questions. The general formulation of a linear 
combination of group means permits a wide range of research hypotheses to 
be tested. Accordingly, the designated linear comparison represents the 
substantive hypothesis of interest and reveals essential information that 
cannot be obtained from the omnibus tests. In one-factor designs, the 
differences between two sets of average group means can be assessed in 
terms of a linear combination of treatment effects. On the other hand, a 
linear combination may be employed to evaluate interactions between 
treatment effects in factorial designs. It follows from the independence, 
normality, and homogeneity of variance assumptions in ANOVA that 
inference for a linear combination of mean effects can be conducted with a t 
statistic. Comprehensive guidelines are available in Kutner et al. (2005) and 
Maxwell and Delaney (2004). 

Although the homogeneity of variance formulation provides a 
convenient and useful setup, it is not unusual for the homoscedasticity 
assumption to be violated in actual applications. Specifically, Grissom 
(2000) and Ruscio and Roche (2012) emphasized that the existence of 
heteroscedasticity in clinical and psychological data is more common than 
most researchers realize. Therefore it is prudent to employ suitable 
techniques that are superior to the traditional inferential methods under 
various conditions of unequal variances (Levy, 1978; Tomarken & Serlin, 
1986). For testing a hypothesis of a linear combination of group means, the 
approximation suggested independently by Satterthwaite (1946) and Welch 
(1947) is the most widely recommended technique to correct for variance 
heterogeneity. The procedure is sometimes referred to as the 
Welch-Satterthwaite test and provides a simple and robust t-solution with 
approximate degrees of freedom. Essentially, this problem is a 
generalization of the well-known Behrens-Fisher problem (Kim and Cohen, 
1998) of testing the difference between two population means when 
population variances are heterogeneous. The technique is also useful for 
more complex frameworks such as linear mixed models and generalized 
linear mixed models. 

Despite the advantages for tackling the fairly complicated issue of 
heteroscedasticity, one of the notable issues of the Welch-Satterthwaite 
procedure is the problem of power and sample size calculations. In view of 
the considerable practical value of a linear combination in heteroscedastic 
ANOVA, this article describes two approaches to power and sample size 
calculations for the Welch-Satterthwaite test. One approach adopts a 
noncentral t approximation to the nonnull distribution of the 
Welch-Satterthwaite test, whereas the other approach considers an exact 
evaluation of the power function of the Welch-Satterthwaite test. The 
approximate distribution presents a particularly attractive and convenient 
expression. Alternatively, the exact formulation is noticeably more effective 
in maintaining the power performance in some situations. Accordingly, the 
two power functions presented can be utilized to calculate the sample sizes 
needed to attain the specified power level for the chosen model 
configurations. The suggested sample size procedures can be viewed as a 
heteroscedastic generalization of Wahlsten (1991) for the standard 
homogeneous variances framework, and a multi-group extension of Jan and 
Shieh (2011) for the comparison between two population means. 

It is noteworthy that all the tests of main and interaction hypotheses 
that relate to linear combinations of group means within the context of an 
ANOVA in primary research are also applicable in meta analysis. A general 
treatment of meta analysis can be found in Hedges and Olkin (1985) and 
Hunter and Schmidt (2004). Moreover, the importance of power 
calculations in meta analysis has been noted in Aguinis et al. (2011), 
Hedges and Pigott (2001), Muncer, Taylor, and Craigie (2002), and 
Valentine, Pigott, and Rothstein (2010). Accordingly, the statistical power 
analysis should be a standard for articles reporting meta analyses (APA 
Publications and Communications Board Working Group on Journal Article 
Reporting Standards, 2008). The most common practice of power 
computation for contrasts among effect sizes assumes the associated 
variances are known (Hedges & Pigott, 2001). However, the variance 
components, just as the mean parameters or effect sizes, are measured with 
errors from independent studies. A similar notion to accommodate the 
variability of sample variances in meta analysis has been recommended by 
Bond, Wiitala, and Richard (2003) and Hartung, Argac, and Makambi 
(2002). Hence it is of theoretical and practical importance to clarify the 
adequacy and discrepancy between the proposed approaches and the 
commonly used technique of Hedges and Pigott (2001). 

Another noticeable utility of testing a linear combination of group 
means is to detect moderating or interactive effects in factorial studies. For 
example, Aguinis (2004), Cohen et al. (2003), and Frazier, Tix, and Barron 
(2004) present practical implications for assessing moderation and 
interaction. However, the tests of hypotheses pertaining to the interaction 
effects often have very low statistical power in applied psychology and 
management research (Aguinis et al., 2005 and the references therein). 
Moreover, Aguinis and Pierce (1998) and Alexander and DeShon (1994) 
reported that the violation of the homogeneity of error variance assumption 
has a detrimental impact on the power for the assessment of moderating 
effects of categorical variables. To remedy the situation, Aguinis, Boik, and 
Pierce (2001) suggested a generalized solution for approximating the power 
to detect interaction effects between categorical moderator variables and 
continuous predictor variables. Alternatively, Guo and Luh (2009) 
presented a sample size method to identify an interaction effect and main 
effects in a heteroscedastic 2x2 design. For the ultimate aim of selecting 
the best approach, it is sensible to explicate the analytical argument and 
empirical performance of the proposed approaches and the method of Guo 
and Luh (2009). 

In subsequent sections, the suggested exact and approximate 
formulations for the nonnull distribution of the Welch-Satterthwaite test are 
presented. Then the considered power functions are employed to compute 
the power and sample size for detecting a linear combination of population 
means. Monte Carlo simulation studies were conducted to illustrate the 
potential advantages and disadvantages between the proposed and available 
procedures for the meta and moderation analyses. Our study reveals unique 
information that not only demonstrates the fundamental behavior of existing 
methodology, but also enhances the usefulness of the Welch-Satterthwaite 
test in the context of heteroscedastic ANOVA. Moreover, corresponding 
SAS computer codes are presented as appendixes to facilitate the 
recommended procedures in planning ANOVA research. 

Linear Combinations 

Consider the one-way heteroscedastic ANOVA model in which the 
observations $Y_{ij}$ are assumed to be independent and normally distributed 
with expected values $\mu_i$ and variances $\sigma_i^2$: 

$$Y_{ij} \sim N(\mu_i, \sigma_i^2), \tag{1}$$

where $\mu_i$ and $\sigma_i^2$ are unknown parameters, $i = 1, \ldots, G$ ($G \ge 2$) and $j = 1, \ldots, N_i$. 
Thus, $G$ denotes the number of groups and $N_i$ is the sample size in the $i$th 
group. A linear combination of mean parameters is defined as 

$$\psi = \sum_{i=1}^{G} c_i \mu_i, \tag{2}$$

where $c_i$ are the linear coefficients. A contrast is a special case of a linear 
combination with the mean coefficients satisfying $\sum_{i=1}^{G} c_i = 0$. It follows from the 
model assumption in Equation 1 that a convenient unbiased estimator $\hat{\psi}$ for 
the combined effect size $\psi$ defined in Equation 2 is of the form 

$$\hat{\psi} = \sum_{i=1}^{G} c_i \bar{Y}_i, \tag{3}$$

where $\bar{Y}_i = \sum_{j=1}^{N_i} Y_{ij}/N_i$ is the $i$th group sample mean and is an unbiased 
estimator of $\mu_i$ for $i = 1, \ldots, G$. Moreover, the linear combination estimator 
$\hat{\psi}$ given in Equation 3 has the distribution 

$$\hat{\psi} \sim N(\psi, \Omega), \tag{4}$$

where $\Omega = \sum_{i=1}^{G} c_i^2 \sigma_i^2 / N_i$. Also, an unbiased estimator $\hat{\Omega}$ of $\Omega$ can be 
obtained by replacing the variance $\sigma_i^2$ in $\Omega$ with its unbiased estimator $S_i^2$ 
as follows: 

$$\hat{\Omega} = \sum_{i=1}^{G} c_i^2 S_i^2 / N_i, \tag{5}$$

where $S_i^2 = \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2/(N_i - 1)$ is the sample variance for $i = 1, \ldots, G$. For 
detecting a linear combination among the mean effects in terms of the 
hypothesis $H_0$: $\psi = \psi_0$ versus $H_1$: $\psi \ne \psi_0$, a useful statistic has the form 

$$T^* = \frac{\hat{\psi} - \psi_0}{\hat{\Omega}^{1/2}}, \tag{6}$$

where $\psi_0$ is a constant. Due to the dependence of $\hat{\Omega}$ on the sample 
variances, the exact distribution of $T^*$ is fairly complicated. 

Notably, the sample variances $S_i^2$ are distributed independently of each 
other and $(N_i - 1)S_i^2/\sigma_i^2 \sim \chi^2(N_i - 1)$ for $i = 1, \ldots, G$. It was demonstrated 
in Satterthwaite (1946) and Welch (1947) by the method of equating 
moments that $\hat{\Omega}$ has the approximate distribution 

$$\hat{\Omega} \sim \frac{\Omega}{\nu}\,\chi^2(\nu), \tag{7}$$

where $\nu = \left(\sum_{i=1}^{G} c_i^2\sigma_i^2/N_i\right)^2 \Big/ \sum_{i=1}^{G} \left[c_i^4\sigma_i^4/\{N_i^2(N_i - 1)\}\right]$. Under the null hypothesis 
$H_0$: $\psi = \psi_0$, it readily follows from Equations 4 and 7 that the quantity $T^*$ 
given in Equation 6 has a convenient approximate distribution 

$$T^* \sim t(\nu),$$

where $t(\nu)$ is a $t$ distribution with degrees of freedom $\nu$. For inferential 
purposes, the degrees of freedom $\nu$ is replaced by its counterpart $\hat{\nu}$ 
with direct substitution of $S_i^2$ for $\sigma_i^2$ in $\nu$, where 

$$\hat{\nu} = \left(\sum_{i=1}^{G} c_i^2 S_i^2/N_i\right)^2 \Big/ \sum_{i=1}^{G} \left[c_i^4 S_i^4/\{N_i^2(N_i - 1)\}\right]. \tag{8}$$
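As a minimal sketch of Equations 3, 5, 6, and 8, the following Python function computes the estimate of the linear combination, its estimated variance, the statistic T*, and the approximate degrees of freedom from group summary statistics. The function name `welch_satterthwaite` and the numerical inputs are hypothetical and serve only as an illustration; they are not taken from the SAS programs mentioned in the article.

```python
import math

def welch_satterthwaite(c, means, variances, n, psi0=0.0):
    """Return (psi_hat, omega_hat, t_star, nu_hat) from group summary statistics."""
    # Per-group contributions c_i^2 * S_i^2 / N_i to the variance estimate (Equation 5).
    parts = [ci**2 * s2 / ni for ci, s2, ni in zip(c, variances, n)]
    psi_hat = sum(ci * m for ci, m in zip(c, means))        # Equation 3
    omega_hat = sum(parts)                                   # Equation 5
    t_star = (psi_hat - psi0) / math.sqrt(omega_hat)         # Equation 6
    # Approximate degrees of freedom (Equation 8).
    nu_hat = omega_hat**2 / sum(p**2 / (ni - 1) for p, ni in zip(parts, n))
    return psi_hat, omega_hat, t_star, nu_hat

# Hypothetical summary statistics for G = 4 groups (illustrative values only).
c = [1, -1/3, -1/3, -1/3]
means = [2.0, 0.1, -0.2, 0.3]
variances = [1.0, 4.0, 9.0, 16.0]
n = [10, 10, 10, 10]

psi_hat, omega_hat, t_star, nu_hat = welch_satterthwaite(c, means, variances, n)
print(psi_hat, omega_hat, t_star, nu_hat)
```

The estimated degrees of freedom always lie between the smallest value of N_i - 1 and N_T - G, which offers a quick sanity check on any implementation.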


Hence, the Welch-Satterthwaite procedure rejects $H_0$ at the 
significance level $\alpha$ if $|T^*| > t_{\hat{\nu}, \alpha/2}$, where $t_{\hat{\nu}, \alpha/2}$ is the upper $100(\alpha/2)$ 
percentile of the $t$ distribution $t(\hat{\nu})$. Accordingly, it can be shown with the 
same theoretical arguments and analytic derivations that the statistic $T^*$ has 
the general approximate distribution 

$$T^* \sim t(\nu, \delta), \tag{9}$$

where $t(\nu, \delta)$ is a noncentral $t$ distribution with degrees of freedom $\nu$ and 
noncentrality parameter 

$$\delta = \frac{\psi - \psi_0}{\Omega^{1/2}}.$$


It immediately follows from the noncentral $t$ distribution given in 
Equation 9 that the power function of the Welch-Satterthwaite test can be 
approximated by 

$$\pi_T(\delta) = P\{|t(\nu, \delta)| > t_{\nu, \alpha/2}\}. \tag{10}$$

The suggested formulation of $\pi_T(\delta)$ is referred to as the approximate T 
approach for ease of exposition. 
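The approximate T power function in Equation 10 can be sketched in Python as follows. This is an illustrative implementation under stated assumptions rather than the authors' program: all function names are hypothetical, only the standard library is used, and the noncentral t probabilities are obtained by numerically integrating over the chi-square denominator (valid here for degrees of freedom greater than 1). The example inputs are the balanced G = 4 design with coefficients {1, -1/3, -1/3, -1/3} and mean 2.18 from the numerical study reported later in the article, for which the approximate T power is stated to be near 0.90.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def chi_mean(h, nu, steps=400):
    """E[h(sqrt(W/nu))] for W ~ chi-square(nu), nu > 1, by Simpson's rule in u = sqrt(W)."""
    upper = math.sqrt(nu) + 8.0  # far beyond the bulk of the density of sqrt(W)
    log_norm = (1.0 - nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0)
    def f(u):
        if u <= 0.0:
            return 0.0  # the density of sqrt(W) vanishes at the origin when nu > 1
        dens = math.exp(log_norm + (nu - 1.0) * math.log(u) - 0.5 * u * u)
        return dens * h(u / math.sqrt(nu))
    step = upper / steps
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * step)
    return total * step / 3.0

def nct_upper(x, nu, delta=0.0):
    """P{t(nu, delta) > x}, using t = (Z + delta) / sqrt(W/nu)."""
    return chi_mean(lambda s: phi(delta - x * s), nu)

def t_critical(nu, alpha):
    """Upper 100(alpha/2) percentile of the central t(nu) distribution, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if nct_upper(mid, nu) > alpha / 2.0 else (lo, mid)
    return 0.5 * (lo + hi)

def power_approx_t(c, mu, sigma2, n, psi0=0.0, alpha=0.05):
    """Approximate T power of Equation 10."""
    parts = [ci**2 * s2 / ni for ci, s2, ni in zip(c, sigma2, n)]
    omega = sum(parts)
    nu = omega**2 / sum(p**2 / (ni - 1) for p, ni in zip(parts, n))
    delta = (sum(ci * mi for ci, mi in zip(c, mu)) - psi0) / math.sqrt(omega)
    crit = t_critical(nu, alpha)
    # Two-sided rejection probability: P{t > crit} + P{t < -crit}.
    return nct_upper(crit, nu, delta) + nct_upper(crit, nu, -delta)

power = power_approx_t([1, -1/3, -1/3, -1/3], [2.18, 0, 0, 0],
                       [1, 4, 9, 16], [10, 10, 10, 10])
print(round(power, 4))
```

Setting all means to zero should return the significance level, which is a convenient check that the critical value and the tail probabilities are mutually consistent.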

Alternatively, the exact distribution of the test statistic $T^*$ may be 
expressed in different forms. Note that the linear combination of 
independent sample variances $\hat{\Omega}$ given in Equation 5 can be rewritten as 

$$\hat{\Omega} = KA,$$

where $K = \sum_{i=1}^{G} K_i \sim \chi^2(N_T - G)$, $K_i = (N_i - 1)S_i^2/\sigma_i^2 \sim \chi^2(N_i - 1)$, $N_T = \sum_{i=1}^{G} N_i$, 
$A = \sum_{i=1}^{G} b_i A_i$, $A_i = K_i/K$, and $b_i = c_i^2\sigma_i^2/\{N_i(N_i - 1)\}$, $i = 1, \ldots, G$. 
The approximate degrees of freedom $\hat{\nu}$ given in Equation 8 can also be 
expressed as 

$$\hat{\nu} = \left(\sum_{i=1}^{G} b_i A_i\right)^2 \Big/ \sum_{i=1}^{G} \{b_i^2 A_i^2/(N_i - 1)\}.$$

Moreover, it is computationally simple and relatively stable to rewrite the 
dependence of $\{A_1, \ldots, A_G\}$ on the chi-square random variables in terms of 
beta random variables; see Johnson, Kotz, and Balakrishnan (1995, p. 212). Specifically, 
$A_1 = \prod_{i=1}^{G-1} B_i$, $A_2 = (1 - B_1)\prod_{i=2}^{G-1} B_i$, ..., $A_{G-1} = (1 - B_{G-2})B_{G-1}$, and 
$A_G = 1 - B_{G-1}$, where $B_i = \{\sum_{j=1}^{i} K_j\}/\{\sum_{j=1}^{i+1} K_j\}$ has a beta distribution 

$$B_i \sim \mathrm{Beta}\left\{\sum_{j=1}^{i}(N_j - 1)/2,\; (N_{i+1} - 1)/2\right\} \quad \text{for } i = 1, \ldots, G - 1.$$

An important underlying property of the suggested formulations is that the random 
variables $B_1, \ldots, B_{G-1}$ and $K$ are mutually independent. Hence, both $\hat{\nu}$ and 
$A$ can be viewed as functions of the beta random variables $\{B_1, \ldots, B_{G-1}\}$, and 
they are independent of $K$. 

With these definitions of transformed variables, the following 
formulation of $T^*$ is considered: 

$$T^* = \frac{T}{V^{1/2}}, \tag{11}$$

where $T = Z/\{K/(N_T - G)\}^{1/2} \sim t(N_T - G, \delta)$, $Z = (\hat{\psi} - \psi_0)/\Omega^{1/2} \sim N(\delta, 1)$, 
and $V = (N_T - G)A/\Omega$. Overall, the random variables $Z$, $K$, and 
$\{B_1, \ldots, B_{G-1}\}$ are mutually independent. Hence, $T$ and $V$ are independent. 
With the alternative expression of $T^*$ given in Equation 11, the exact power 
function of the Welch-Satterthwaite test can be formulated as 

$$\pi_E(\delta) = E_B\left[P\{|t(N_T - G, \delta)| > t_{\hat{\nu}, \alpha/2} V^{1/2}\}\right], \tag{12}$$

where the expectation $E_B\{\cdot\}$ is taken with respect to the joint distribution of 
$\{B_1, \ldots, B_{G-1}\}$. Since all related functions are readily embedded in major 
statistical packages, Monte Carlo integration provides a feasible approach to 
perform the required assessment of $\pi_E(\delta)$, especially when the number of 
groups is large. 
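The Monte Carlo integration of Equation 12 can be sketched as follows. The code is a hedged illustration, not the authors' SAS implementation: function names, the number of beta draws, and the seed are hypothetical choices, the helper routines integrate the noncentral t probabilities numerically with the standard library only, and every group is assumed to have at least three observations so that the estimated degrees of freedom exceed 1.

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def chi_mean(h, nu, steps=200):
    """E[h(sqrt(W/nu))] for W ~ chi-square(nu), nu > 1, by Simpson's rule in u = sqrt(W)."""
    upper = math.sqrt(nu) + 8.0
    log_norm = (1.0 - nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0)
    def f(u):
        if u <= 0.0:
            return 0.0
        return math.exp(log_norm + (nu - 1.0) * math.log(u) - 0.5 * u * u) * h(u / math.sqrt(nu))
    step = upper / steps
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * step)
    return total * step / 3.0

def nct_upper(x, nu, delta=0.0):
    """P{t(nu, delta) > x}."""
    return chi_mean(lambda s: phi(delta - x * s), nu)

def t_critical(nu, alpha):
    """Upper 100(alpha/2) percentile of t(nu), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if nct_upper(mid, nu) > alpha / 2.0 else (lo, mid)
    return 0.5 * (lo + hi)

def power_exact(c, mu, sigma2, n, psi0=0.0, alpha=0.05, draws=100, seed=1):
    """Monte Carlo estimate of the exact power in Equation 12."""
    rng = random.Random(seed)
    G, df = len(n), sum(n) - len(n)
    omega = sum(ci**2 * s2 / ni for ci, s2, ni in zip(c, sigma2, n))
    delta = (sum(ci * mi for ci, mi in zip(c, mu)) - psi0) / math.sqrt(omega)
    b = [ci**2 * s2 / (ni * (ni - 1)) for ci, s2, ni in zip(c, sigma2, n)]
    total = 0.0
    for _ in range(draws):
        # B_i ~ Beta(sum_{j<=i}(N_j - 1)/2, (N_{i+1} - 1)/2), i = 1, ..., G-1.
        B, shape = [], 0.0
        for i in range(G - 1):
            shape += (n[i] - 1) / 2.0
            B.append(rng.betavariate(shape, (n[i + 1] - 1) / 2.0))
        # Recover A_1, ..., A_G from the beta variables.
        A, prod = [0.0] * G, 1.0
        for i in range(G - 1, 0, -1):
            A[i] = (1.0 - B[i - 1]) * prod
            prod *= B[i - 1]
        A[0] = prod
        a_sum = sum(bi * ai for bi, ai in zip(b, A))
        nu_hat = a_sum**2 / sum((bi * ai)**2 / (ni - 1) for bi, ai, ni in zip(b, A, n))
        cut = t_critical(nu_hat, alpha) * math.sqrt(df * a_sum / omega)
        total += nct_upper(cut, df, delta) + nct_upper(cut, df, -delta)
    return total / draws

exact = power_exact([1, -1/3, -1/3, -1/3], [2.18, 0, 0, 0], [1, 4, 9, 16], [10, 10, 10, 10])
print(round(exact, 3))
```

A larger number of draws reduces the Monte Carlo error at the cost of run time; the small default is chosen only to keep the pure-Python illustration fast.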

To determine sample sizes in planning research designs, the power 
functions can be employed to calculate the sample sizes $\{N_i, i = 1, \ldots, G\}$ 
needed to attain the specified power $1 - \beta$ for the chosen significance level 
$\alpha$, null value $\psi_0$, mean coefficients $\{c_i, i = 1, \ldots, G\}$, and parameter values 
$\{\mu_i, \sigma_i^2, i = 1, \ldots, G\}$. It usually involves an iterative process to find the 
solution. As there may be several possible sets of sample sizes that satisfy 
the chosen power level, it is constructive to consider an appropriate design 
with a priori designated sample size ratios $\{r_1, \ldots, r_G\}$ with $r_i = N_i/N_1$ for $i = 
1, \ldots, G$. Thus the process is confined to deciding the minimum sample size 
$N_1$ (with $N_i = N_1 r_i$, $i = 2, \ldots, G$) required to achieve the selected power level 
with the power functions in Equations 10 and 12, respectively. In order to 
explicate the applicability of the power and sample size methodology for the 
Welch-Satterthwaite procedure, the subsequent sections consider 
design configurations in the contexts of meta and moderation analyses. 
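The iterative search described above can be sketched with the approximate T power function of Equation 10. The sketch is illustrative only: the function names are hypothetical, non-integer products N_1 r_i are rounded up (an assumption, since the article does not state a rounding rule), the ratios are assumed to be at least 1, and the search starts at N_1 = 3 so that the degrees of freedom exceed 1.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def chi_mean(h, nu, steps=400):
    """E[h(sqrt(W/nu))] for W ~ chi-square(nu), nu > 1, by Simpson's rule in u = sqrt(W)."""
    upper = math.sqrt(nu) + 8.0
    log_norm = (1.0 - nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0)
    def f(u):
        if u <= 0.0:
            return 0.0
        return math.exp(log_norm + (nu - 1.0) * math.log(u) - 0.5 * u * u) * h(u / math.sqrt(nu))
    step = upper / steps
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * step)
    return total * step / 3.0

def nct_upper(x, nu, delta=0.0):
    """P{t(nu, delta) > x}."""
    return chi_mean(lambda s: phi(delta - x * s), nu)

def t_critical(nu, alpha):
    """Upper 100(alpha/2) percentile of t(nu), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if nct_upper(mid, nu) > alpha / 2.0 else (lo, mid)
    return 0.5 * (lo + hi)

def power_approx_t(c, mu, sigma2, n, psi0=0.0, alpha=0.05):
    """Approximate T power of Equation 10."""
    parts = [ci**2 * s2 / ni for ci, s2, ni in zip(c, sigma2, n)]
    omega = sum(parts)
    nu = omega**2 / sum(p**2 / (ni - 1) for p, ni in zip(parts, n))
    delta = (sum(ci * mi for ci, mi in zip(c, mu)) - psi0) / math.sqrt(omega)
    crit = t_critical(nu, alpha)
    return nct_upper(crit, nu, delta) + nct_upper(crit, nu, -delta)

def min_n1(c, mu, sigma2, ratios, target=0.90, alpha=0.05, start=3, limit=10000):
    """Smallest N_1 (with N_i = ceil(N_1 * r_i)) whose approximate power reaches the target."""
    for n1 in range(start, limit + 1):
        n = [math.ceil(n1 * r) for r in ratios]
        p = power_approx_t(c, mu, sigma2, n, alpha=alpha)
        if p >= target:
            return n1, n, p
    raise ValueError("target power not reached within the search limit")

n1, sizes, attained = min_n1([1, -1/3, -1/3, -1/3], [2.18, 0, 0, 0],
                             [1, 4, 9, 16], [1, 1, 1, 1], target=0.90)
print(n1, sizes, round(attained, 4))
```

Replacing `power_approx_t` with a routine for Equation 12 yields the corresponding exact sample size search; the structure of the loop is unchanged.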

Power calculations in meta analysis 

The most appealing reason for conducting a meta analysis is that a 
collection of related studies has higher statistical power than any single 
one of those studies (Hunter & Schmidt, 2004). However, as with 
prospective power analysis in a primary study, the statistical power in a 
meta-analysis depends on the joint impact of responsible factors including 
the population effect size, the associated variance component, Type I error 
rate, and sample size. Without a detailed appraisal, the actual power of the 
collection of studies may still not be high enough to detect the effect size of 
importance and to support the research question of interest. It is essential to 
note that the comparison of mean effect sizes between two sets of studies in 
meta analysis corresponds to testing a linear combination of mean effects in 
a one-way heteroscedastic ANOVA. To demonstrate the contrasting 
behavior of the alternative power functions of the Welch-Satterthwaite test 
in the context of meta analysis, a numerical investigation was conducted in 
two stages. The first stage presented power calculations for Hedges and 
Pigott's (2001, Equation 31) method and the proposed approximate and 
exact approaches described in Equations 10 and 12, respectively, 
under several model configurations. Then, a Monte Carlo simulation was 
performed to explicate the accuracy of the competing procedures under the 
design characteristics specified in the first stage. 

To reveal the potential extent of characteristics that an applied work 
may reflect in meta studies, the examined frameworks consist of the 
principal factors of sample sizes, variance components and linear 
coefficients. Following the model formulation of Hedges and Pigott (2001, 
Equation 1), the numbers of studies are set as G = 4 and 12 with an average 
study size $N_T/G = 10$. The sample size patterns are deliberately varied with 
variance components to have three different characteristics: balanced, 
direct-pairing and inverse-pairing structures. For the case of G = 4, the 
heterogeneous variances are chosen as {1, 4, 9, 16}. Accordingly, the three 
study size designs $\{N_i, i = 1, \ldots, 4\}$ for a total $N_T = 40$ are 

Balanced design: {10, 10, 10, 10}; 

Direct-pairing design: {4, 8, 12, 16}; 

Inverse-pairing design: {16, 12, 8, 4}. 

Moreover, the three sample size schemes are combined with three different 
sets of linear coefficients: $\{c_1, c_2, c_3, c_4\}$ = {1, -1/3, -1/3, -1/3}, {1/3, 1/3, 
1/3, -1} and {1/2, 1/2, -1/2, -1/2}. 

On the other hand, the prescribed configurations for G = 4 are 
extended to G = 12 by replicating each element three times. Specifically, 
the variance components are {1, 1, 1, 4, 4, 4, 9, 9, 9, 16, 16, 16} and the 
corresponding sample size designs $\{N_i, i = 1, \ldots, 12\}$ are 

Balanced design: {10,10,10,10,10,10,10,10,10,10,10,10}; 

Direct-pairing design: {4,4,4, 8, 8, 8,12,12,12,16,16,16}; 

Inverse-pairing design: {16,16, 16,12, 12, 12, 8, 8, 8,4,4,4}. 

Although these sample sizes may be smaller than would be likely in many 
ANOVA studies, it is plausible that if problems or deficiencies were to be 
seen with power calculations, they would be most apparent with small study 
sizes. In this case, the three settings of linear coefficients $\{c_i, i = 1, \ldots, 12\}$ 
are denoted by 

LC1: {1/3,1/3,1/3,-1/9,-1/9,-1/9,-1/9,-1/9,-1/9,-1/9,-1/9,-1/9}; 

LC2: {1/9,1/9,1/9,1/9,1/9,1/9,1/9,1/9,1/9,-1/3,-1/3,-1/3}; 

LC3: {1/6,1/6,1/6,1/6,1/6,1/6,-1/6,-1/6,-1/6,-1/6,-1/6,-1/6}. 

Without loss of generality, the mean effects are set as $\mu_1 = \mu$ and $\mu_i = 0$ for $i 
= 2$ to $G$, where $\mu$ is properly selected such that the resulting power level of 
the approximate T procedure is near 0.90. Specifically, the selected values 
of $\mu$ for the nine settings in Table 1 are 2.18, 14.21, 5.87, 2.53, 11.05, 5.27, 
3.15, 29.42, and 9.38. The corresponding mean values in Table 2 are 3.69, 
23.02, 9.87, 4.10, 18.50, 8.96, 4.84, 38.34, and 14.03. Throughout this 
empirical study, the significance level is set as $\alpha = 0.05$ and the null value is 
$\psi_0 = 0$. Overall these considerations result in a total of 18 different model 
configurations. 

With these specifications, the attained power level for the designated 
power function can be readily computed. In addition, a simple expression 
has been described in Hedges and Pigott (2001) for the nonnull distribution 
of the Welch-Satterthwaite statistic $T^*$. Specifically, under the assumption 
that the variances are known, Hedges and Pigott (2001, Equation 31) 
suggested an approximate Z formulation 

$$\pi_Z(\delta) = P\{|N(\delta, 1)| > z_{\alpha/2}\}, \tag{13}$$

where $N(\delta, 1)$ is a normal distribution with mean $\delta$ and variance 1, and $z_{\alpha/2}$ 
is the upper $100(\alpha/2)$ percentile of the standard normal distribution. The 
performance of the power functions $\pi_E(\delta)$, $\pi_T(\delta)$ and $\pi_Z(\delta)$ is examined for 
each of the 18 combined settings of 2 numbers of studies, 3 study size 
structures, and 3 linear coefficient sets. The computed powers for the 
selected model configurations are listed in Tables 1 and 2 for G = 4 and 12, 
respectively. An inspection of the power calculations reported in Tables 1 
and 2 shows a consistent order among the achieved power levels: $\pi_E(\delta) 
< \pi_T(\delta) < \pi_Z(\delta)$ for all the cases considered here. The power outcome given 
by the approximate Z formula may be the largest; however, this does not 
imply that it is the best method. The accuracy of the alternative formulas is 
further evaluated through the following Monte Carlo simulation. 
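The approximate Z formulation in Equation 13 is simple enough to evaluate directly. The sketch below is illustrative (the function name `power_approx_z` is hypothetical) and uses the balanced G = 4 design with coefficients {1, -1/3, -1/3, -1/3} and mean 2.18 described above.

```python
import math
from statistics import NormalDist

def power_approx_z(c, mu, sigma2, n, psi0=0.0, alpha=0.05):
    """Approximate Z power of Equation 13, treating the variances as known."""
    omega = sum(ci**2 * s2 / ni for ci, s2, ni in zip(c, sigma2, n))
    delta = (sum(ci * mi for ci, mi in zip(c, mu)) - psi0) / math.sqrt(omega)
    std = NormalDist()
    z = std.inv_cdf(1.0 - alpha / 2.0)      # upper 100(alpha/2) percentile
    # P{N(delta, 1) > z} + P{N(delta, 1) < -z}
    return (1.0 - std.cdf(z - delta)) + std.cdf(-z - delta)

power_z = power_approx_z([1, -1/3, -1/3, -1/3], [2.18, 0, 0, 0],
                         [1, 4, 9, 16], [10, 10, 10, 10])
print(round(power_z, 4))
```

Because the normal critical value is smaller than any t critical value with finite degrees of freedom, this computed power exceeds the approximate T power for the same configuration, in line with the ordering noted above.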

In the second step, estimates of the true power associated with the 
given model configurations for all three procedures were computed via 
Monte Carlo simulations of 10,000 independent data sets. For each 
replicate, $\{N_1, \ldots, N_G\}$ normal outcomes are generated with the designated 
configurations for the one-way heteroscedastic ANOVA model. Next, the 
test statistic $T^*$ is computed and the simulated power is the proportion of the 
10,000 replicates whose test statistics $T^*$ exceed the corresponding critical 
value, $|T^*| > t_{\hat{\nu}, 0.025}$. Adequacy of the examined procedure for power 
calculation is determined by the error between the simulated power and 
computed power presented above. The simulated power and errors are also 
summarized in Tables 1 and 2. 
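The simulation step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names, seed, and the reduced number of replicates (2,000 rather than the 10,000 used in the study) are hypothetical choices, and the rejection rule is applied through the equivalent two-sided p-value so that no quantile search is needed.

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def t_upper(x, nu, steps=200):
    """P{t(nu) > x} for nu > 1, by Simpson's rule over u = sqrt(chi-square)."""
    upper = math.sqrt(nu) + 8.0
    log_norm = (1.0 - nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0)
    def f(u):
        if u <= 0.0:
            return 0.0
        dens = math.exp(log_norm + (nu - 1.0) * math.log(u) - 0.5 * u * u)
        return dens * phi(-x * u / math.sqrt(nu))
    step = upper / steps
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * step)
    return total * step / 3.0

def simulate_power(c, mu, sigma2, n, psi0=0.0, alpha=0.05, reps=2000, seed=2015):
    """Proportion of simulated data sets in which the Welch-Satterthwaite test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        means, variances = [], []
        for mi, s2, ni in zip(mu, sigma2, n):
            y = [rng.gauss(mi, math.sqrt(s2)) for _ in range(ni)]
            ybar = sum(y) / ni
            means.append(ybar)
            variances.append(sum((v - ybar) ** 2 for v in y) / (ni - 1))
        parts = [ci**2 * s2 / ni for ci, s2, ni in zip(c, variances, n)]
        omega_hat = sum(parts)
        t_star = (sum(ci * m for ci, m in zip(c, means)) - psi0) / math.sqrt(omega_hat)
        nu_hat = omega_hat**2 / sum(p**2 / (ni - 1) for p, ni in zip(parts, n))
        # |T*| > t_{nu_hat, alpha/2} is equivalent to a two-sided p-value below alpha.
        if 2.0 * t_upper(abs(t_star), nu_hat) < alpha:
            rejections += 1
    return rejections / reps

simulated = simulate_power([1, -1/3, -1/3, -1/3], [2.18, 0, 0, 0],
                           [1, 4, 9, 16], [10, 10, 10, 10])
print(simulated)
```

With 2,000 replicates the standard error of a simulated power near 0.90 is roughly 0.007, so differences of the size reported for the approximate Z method remain clearly visible even in this reduced setting.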

According to the extensive numerical results, the approximate Z 
method of Hedges and Pigott (2001) is not consistently accurate because 
only 7 out of 18 cases have absolute error less than or equal to 0.02. The 
differences of the remaining 11 cases are substantial and unsatisfactory, 
especially for the circumstances under inverse pairing of variance 
heterogeneity and sample sizes. Specifically, the results associated with the 
inverse-pairing condition incur the sizeable errors of -0.0919, -0.1008 and 
-0.1003 for the three linear combinations in Table 1. Also, the 
corresponding errors are -0.0402, -0.0765 and -0.0482 for the three 
comparisons presented in Table 2. Thus, the failure to incorporate the 
uncertainty associated with variance estimation is a disadvantage of the 
approximate Z power function proposed in Hedges and Pigott (2001). 

On the other hand, the computed powers of the noncentral t function 
$\pi_T(\delta)$ appear to maintain a reasonable range near the simulated outcomes. 
For the balanced and direct-pairing designs, the approximate T method 
generally gives reliable results with absolute errors mostly less than 0.01. 
The only exception is the error -0.0144 associated with the first comparison 
of the direct-pairing scheme in Table 1. However, the performance is less 
satisfactory in the inverse-pairing situations. When the number of studies is 
G = 4, the induced errors for the first and third linear combinations 
of the inverse-pairing settings are -0.0222 and -0.0219, respectively. 

Table 1. Simulated power and computed power for the test of linear 
combination $H_0$: $\psi = 0$ versus $H_1$: $\psi \ne 0$ with $\alpha = 0.05$ and $\{\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2\}$ 
= {1, 4, 9, 16} 

Table 2. Simulated power and computed power for the test of linear 
combination $H_0$: $\psi = 0$ versus $H_1$: $\psi \ne 0$ with $\alpha = 0.05$ and variance 
components: {1, 1, 1, 4, 4, 4, 9, 9, 9, 16, 16, 16} 

Under the extended inverse-pairing consideration of G = 12, the second 
comparison yields the largest error -0.0256 of all 18 cases. Based on the 
numerical evidence, the approximate T approach is slightly vulnerable 
under the circumstance that the sample sizes are inversely paired with 
heterogeneous variances. Conversely, the exact approach performs 
extremely well because all absolute errors are less than or equal to 0.0043 
for the 18 cases examined here. In addition to the reported assessments with 
the nominal power 0.90, the accuracy was further justified for the same 
model configurations with a smaller target power 0.80. To conserve space, 
the details are not given here. Thus the methodology of exact power 
calculation is of great potential use. Although it is more computationally 
intensive than the approximate T approach, it is of little consequence if a 
computer is employed. 

Sample size calculations in moderation analysis 

It is an important problem in moderation research to clarify the impact 
of a moderator on the direction and/or strength of the relationship between a 
predictor and a criterion variable (Baron & Kenny, 1986). Accordingly, the 
simplest situation of the moderation analysis is to assess whether a dichotomous 
independent variable's effect on the dependent variable varies as a function 
of another dichotomy. This particular moderation phenomenon is 
conceptually equivalent to the interaction effect in a 2 x 2 factorial design. 
Two of the vital factors known to affect power are the sample size and error 
variance heterogeneity. Hence there is a need to understand the inherent 
relationship that exists between the desired power performance and the 
necessary sample size conditional on the heteroscedastic model structure. 

For ease of explication, the statistical model of a 2 x 2 heteroscedastic 
ANOVA design is written as: 

$$X_{stl} \sim N(\mu_{st}, \sigma_{st}^2),$$

where $X_{stl}$ represents the independent and normally distributed response 
variable with expected value $\mu_{st}$ and variance $\sigma_{st}^2$; $\mu_{st}$ is the population 
mean, and $\sigma_{st}^2$ is the error variance at level $s$ of A and level $t$ of B for $s$ and $t$ 
= 1 and 2, and $l = 1, \ldots, M_{st}$. Accordingly, the interaction or moderation 
effect size between the two factors A and B can be expressed as 

$$\psi_I = \mu_{11} - \mu_{12} - \mu_{21} + \mu_{22}. \tag{14}$$


The linear contrast 

$$\hat{\psi}_I = \bar{X}_{11} - \bar{X}_{12} - \bar{X}_{21} + \bar{X}_{22} \tag{15}$$




G. Shieh & S.L. Jan 


is an unbiased estimator of $\psi_I$, where $\bar{X}_{st} = \sum_{l=1}^{M_{st}} X_{stl}/M_{st}$ for $s$ 
and $t$ = 1 and 2. It is easily seen that there exists a close resemblance 
between the linear formulations and statistical properties of $\hat{\psi}_I$ and $\hat{\psi}$ 
defined in Equations 15 and 3, respectively. Hence, the techniques for 
obtaining power and sample size for the test of $\psi$ can immediately be 
applied to compute power and sample size for the test of $\psi_I$. A detailed 
account of the related methodology is presented next to document their 
distinct characteristics in terms of theoretical principles and computational 
requirements. 


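The hypothesis testing of $H_0$: $\psi_I = \psi_{I0}$ versus $H_1$: $\psi_I \ne \psi_{I0}$ can be 
conducted with the following statistic 

$$T_I^* = \frac{\hat{\psi}_I - \psi_{I0}}{\hat{\Omega}_I^{1/2}}, \tag{16}$$

where $\psi_{I0}$ is a specified constant, $\hat{\Omega}_I = \sum_{s=1}^{2}\sum_{t=1}^{2} S_{st}^2/M_{st}$ is the typical estimator 
of the variance of $\hat{\psi}_I$, and $S_{st}^2 = \sum_{l=1}^{M_{st}} (X_{stl} - \bar{X}_{st})^2/(M_{st} - 1)$ is the 
sample variance estimator of $\sigma_{st}^2$ for $s$ and $t$ = 1 and 2. The test procedure 
rejects $H_0$ at the significance level $\alpha$ if $|T_I^*| > t_{\hat{\nu}_I, \alpha/2}$, where 

$$\hat{\nu}_I = \left(\sum_{s=1}^{2}\sum_{t=1}^{2} S_{st}^2/M_{st}\right)^2 \Big/ \sum_{s=1}^{2}\sum_{t=1}^{2} \left[S_{st}^4/\{M_{st}^2(M_{st} - 1)\}\right]. \tag{17}$$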

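It follows from Equation 10 that the corresponding power can be 
approximated by 

$$\pi_{TI}(\delta_I) = P\{|t(\nu_I, \delta_I)| > t_{\nu_I, \alpha/2}\}, \tag{18}$$

where $\delta_I = (\psi_I - \psi_{I0})/\Omega_I^{1/2}$, $\Omega_I = \sum_{s=1}^{2}\sum_{t=1}^{2} \sigma_{st}^2/M_{st}$, and 
$\nu_I = \left(\sum_{s=1}^{2}\sum_{t=1}^{2} \sigma_{st}^2/M_{st}\right)^2 \Big/ \sum_{s=1}^{2}\sum_{t=1}^{2} \left[\sigma_{st}^4/\{M_{st}^2(M_{st} - 1)\}\right]$. 
Also, the exact power function in Equation 12 is modified as 

$$\pi_{EI}(\delta_I) = E_B\left[P\{|t(M_T - 4, \delta_I)| > t_{\hat{\nu}_I, \alpha/2} V_I^{1/2}\}\right], \tag{19}$$

where $M_T = \sum_{s=1}^{2}\sum_{t=1}^{2} M_{st}$ and $V_I$ is the counterpart of $V$ defined in Equation 
11. In this case, the expectation $E_B\{\cdot\}$ is taken with respect to the joint 
distribution of $\{B_1, B_2, B_3\}$. 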
In contrast to the proposed formulations, Guo and Luh (2009, pp. 420-421) 
exploited Welch's (1938) two-sample statistic for the Behrens-Fisher 
problem to obtain a distinct method for the detection of an interaction effect 
$H_0$: $\psi_I = \psi_{I0}$ versus $H_1$: $\psi_I \ne \psi_{I0}$. With the definitions of 
$S_a^2 = (S_{11} + S_{22})^2$ and $S_b^2 = (S_{12} + S_{21})^2$, Guo and Luh's (2009) test procedure 
rejects $H_0$: $\psi_I = \psi_{I0}$ at the significance level $\alpha$ if $|T_I^*| > t_{\hat{\nu}_{GL}, \alpha/2}$, where 

$$\hat{\nu}_{GL} = (S_a^2/M_a + S_b^2/M_b)^2 \Big/ \left[(S_a^2/M_a)^2/(M_a - 1) + (S_b^2/M_b)^2/(M_b - 1)\right].$$

For notational simplicity, let $\sigma_a^2 = (\sigma_{11} + \sigma_{22})^2$, $\sigma_b^2 = (\sigma_{12} + \sigma_{21})^2$, 
$M_a = M_{11} + M_{22}$, and $M_b = M_{12} + M_{21}$. In general, they suggested the 
approximate noncentral $t$ distribution for $T_I^*$: 

$$T_I^* \sim t(\nu_{GL}, \delta_{GL}), \tag{20}$$

where $\nu_{GL} = (\sigma_a^2/M_a + \sigma_b^2/M_b)^2 \Big/ \left[(\sigma_a^2/M_a)^2/(M_a - 1) + (\sigma_b^2/M_b)^2/(M_b - 1)\right]$, 
$\delta_{GL} = (\psi_I - \psi_{I0})/\Omega_{GL}^{1/2}$, and $\Omega_{GL} = \sigma_a^2/M_a + \sigma_b^2/M_b$. Hence, the corresponding 
power function is of the form 

$$\pi_{GL}(\delta_{GL}) = P\{|t(\nu_{GL}, \delta_{GL})| > t_{\nu_{GL}, \alpha/2}\}. \tag{21}$$

To determine sample sizes for testing an interaction effect within the 
context of heteroscedastic ANOVA, the power functions $\pi_{TI}(\delta_I)$, $\pi_{EI}(\delta_I)$, and 
$\pi_{GL}(\delta_{GL})$ defined in Equations 18, 19 and 21, respectively, can be employed 
to calculate the sample sizes $\{M_{11}, M_{12}, M_{21}, M_{22}\}$ needed to attain the 
specified power $1 - \beta$ for the chosen significance level $\alpha$, null effect $\psi_{I0}$, 
mean effects $\{\mu_{11}, \mu_{12}, \mu_{21}, \mu_{22}\}$, error variances $\{\sigma_{11}^2, \sigma_{12}^2, \sigma_{21}^2, \sigma_{22}^2\}$, and 
designated sample size ratios $\{r_{11}, r_{12}, r_{21}, r_{22}\}$ where $r_{st} = M_{st}/M_{11}$ for $s$ and $t$ 
= 1 and 2. 


To reveal the underlying robustness and deficiency of the contending 
techniques, numerical assessments were carried out for the model settings in 
Guo and Luh (2009) in which the mean values and error variances are 
$\{\mu_{11}, \mu_{12}, \mu_{21}, \mu_{22}\}$ = {71.3, 93.9, 77.1, 93.3} and $\{\sigma_{11}^2, \sigma_{12}^2, \sigma_{21}^2, \sigma_{22}^2\}$ = {146.41, 
129.96, 207.36, 153.76}, respectively. Moreover, seven patterns of sample 
size ratios were used to assess power and sample size calculations: $\{r_{11}, r_{12}, 
r_{21}, r_{22}\}$ = {1, 1, 1, 1}, {1, 1, 2, 2}, {1, 2, 1, 2}, {2, 1, 2, 1}, {2, 2, 1, 1}, {2, 
1, 4, 3}, and {3, 4, 1, 2}. Essentially, these designated allocation schemes 
produce a wide variety of balanced, mixed-pairing, direct-pairing, and 
inverse-pairing settings with the heterogeneous variances and thus cover a 
broader range of situations than the single case considered by Guo and Luh 
(2009). 

With these specifications, the required sample sizes were computed 
for the abovementioned three approaches with the chosen power value $(1 - 
\beta) = 0.80$, significance level $\alpha = 0.05$ and null value $\psi_{I0} = 0$. The 
results associated with the exact and approximate T approaches are 
essentially identical. Therefore, only the empirical outcomes of the exact 
approach and Guo and Luh's method are presented in Table 3. The actual 
powers or attained powers associated with the required sample sizes are 
computed with the power functions $\pi_{EI}(\delta_I)$ and $\pi_{GL}(\delta_{GL})$. Similar to the 
empirical study for the meta analysis, simulated powers were obtained by a 
Monte Carlo simulation study and the results are also presented in Table 3. 

A numerical analysis was also conducted for the same model configurations 
with the modified heteroscedastic magnitudes $\{\sigma_{11}^2, \sigma_{12}^2, \sigma_{21}^2, \sigma_{22}^2\}$ = 
{146.41, 129.96, 207.36, 153.76}/9 = {16.27, 14.44, 23.04, 17.08}. The 
corresponding results are presented in Table 4. 

It follows from the comprehensive results in Tables 3-4 that the 
necessary sample sizes for Guo and Luh's (2009) method are equal to or 
slightly smaller than those of the exact approach. However, the powers 
given by the power function $\pi_{GL}(\delta_{GL})$ seem to be markedly larger than the 
nominal value 0.80 with two exceptions in the balanced sample size 
designs. In addition, the errors of their procedure are sizable, especially for 
the two cases of -0.1199 and -0.1296 associated with the inverse pairing 
between sample size and error variance in Tables 3 and 4, respectively. 
Also, it can be shown that $\Omega_{GL} = \Omega_I$ and $\delta_{GL} = \delta_I$ when $r_{st} = M_{st}/M_{11} = \sigma_{st}/\sigma_{11}$ 
for $s$ and $t$ = 1 and 2. Due to the dominant role of noncentrality in the power 
function, the power function $\pi_{GL}(\delta_{GL})$ will give the proper value only when 
this specific condition is satisfied. On the other hand, the behavior of the 
exact approach appears to be excellent for the range of model specifications 
considered here. In particular, the incurred errors of the 14 cases are all 
within the small range of -0.0096 to 0.0091. Hence the proposed procedure 
possesses the advantage of general applicability and good accuracy without 
any imposed restriction on the model configurations. In short, the analytic 
clarification and numerical evidence show that the suggested approach 
outperforms Guo and Luh's (2009) method in power calculations and 
sample size determinations for the Welch-Satterthwaite test of interaction 
effect within the 2x2 heteroscedastic ANOVA framework. 

Table 3. Computed sample size, computed power, and simulated power 
for the test of interaction effect $H_0: \psi_I = 0$ versus 
$H_1: \psi_I \neq 0$ with $\alpha = 0.05$, nominal power 
$(1 - \beta) = 0.80$, mean effects 
$\{\mu_{11}, \mu_{12}, \mu_{21}, \mu_{22}\}$ = {71.3, 93.9, 77.1, 93.3}, 
and error variances 
$\{\sigma_{11}^2, \sigma_{12}^2, \sigma_{21}^2, \sigma_{22}^2\}$ = 
{146.41, 129.96, 207.36, 153.76} 

Table 4. Computed sample size, computed power, and simulated power 
for the test of interaction effect $H_0: \psi_I = 0$ versus 
$H_1: \psi_I \neq 0$ with $\alpha = 0.05$, nominal power 
$(1 - \beta) = 0.80$, mean effects 
$\{\mu_{11}, \mu_{12}, \mu_{21}, \mu_{22}\}$ = {71.3, 93.9, 77.1, 93.3}, 
and error variances 
$\{\sigma_{11}^2, \sigma_{12}^2, \sigma_{21}^2, \sigma_{22}^2\}$ = 
{16.27, 14.44, 23.04, 17.08} 

DISCUSSION AND CONCLUSIONS 

In view of its practical value in the context of heteroscedastic 
ANOVA designs, this article presents two approaches to power and sample 
size calculations for the Welch-Satterthwaite test of linear combinations of 
group means. The approximate method provides a transparent formulation 
and relies on a noncentral t distribution, whereas the exact procedure is of 
theoretical importance and involves a Beta mixture of noncentral t 
distributions. It can be shown that the two approaches are asymptotically 
equivalent as the sample sizes go to infinity. However, their finite-sample 
properties can be substantially different, and the respective power functions 
may yield markedly different results for relatively small samples and certain 
model settings. It is shown here that although computation is slightly more 
involved with the exact procedure, the extra complexity is outweighed by its 
superior accuracy. 

It is vital to ensure that the underlying properties of a power and 
sample size procedure are well understood so that a well-supported and 
useful recommendation can be offered for empirical studies. The usefulness 
and diversity of the suggested power and sample size procedures are 
illustrated with two applications in meta and moderation analyses. Detailed 
analytic explication and numerical assessment are presented to demonstrate 
the advantage of the proposed procedures and the potential deficiency of 
existing methods. In particular, the failure to accommodate the stochastic 
nature of error variances and the failure to account for the diverse 
structure of sample sizes are restrictions of the current methods of Hedges 
and Pigott (2001) and Guo and Luh (2009) for meta analysis and moderation 
analysis, respectively. Consequently, the suggested power and sample size 
procedures update and expand upon current work in the literature, and the 
developed computer programs can facilitate the application of the suggested 
algorithms. 

This study focuses on the appropriate procedure for testing the linear 
combination of group means of independent normal distributions with 
possibly unequal error variances. Moreover, according to the findings of 
Vallejo, Ato, and Fernandez (2010), the Welch-James procedure is robust to 
departures from the normality assumption when the distribution is 
symmetric with a moderate degree of kurtosis. Therefore, the related 
Welch-Satterthwaite test procedure is still of practical interest and 
usefulness. On the other hand, the established class of generalized linear 
models offers an excellent alternative for analyzing data when the normality 
and homogeneity assumptions are not tenable. Related details and follow-up 
procedures can be found in McCullagh and Nelder (1989) and McCulloch, 
Searle, and Neuhaus (2008). 

APPENDIX A 

SAS IML program for computing the attained power for Welch-Satterthwaite's 
test 


PROC IML;PRINT "POWER CALCULATIONS"; 

*USER SPECIFICATIONS; 
ALPHA=0.05;                      *TYPE I ERROR; 
MUVEC={2.18 0 0 0};              *GROUP MEANS; 
VARVEC={1 4 9 16};               *GROUP VARIANCES; 
NVEC={10 10 10 10};              *GROUP SAMPLE SIZES; 
CVEC={1 -1 -1 -1}/{1 3 3 3};     *COEFFICIENTS; 
*END OF SPECIFICATIONS; 

G=NCOL(VARVEC);NT=SUM(NVEC);PSI=CVEC*MUVEC`; 
VARPSI=(CVEC##2)*(VARVEC/NVEC)`; 
DELTA=PSI/SQRT(VARPSI);DF=NT-G;DFVEC=NVEC-1; 

*APPRO METHOD; 
KV=(CVEC##2)#VARVEC/NVEC;V1=SUM(KV)##2; 
V2=SUM((KV##2)/(NVEC-1));DFAP=V1/V2;CRIT=TINV(1-ALPHA/2,DFAP); 
APP=CDF('T',-CRIT,DFAP,DELTA)+SDF('T',CRIT,DFAP,DELTA); 
PRINT 'APPROXIMATE T: POWER' APP[FORMAT=8.4]; 

*EXACT METHOD; 
SEED=1001;CALL STREAMINIT(SEED);REPN=10000; 
DF1=CUSUM(DFVEC[1,1:G-1]);DF2=DFVEC[1,2:G];EP=0; 
DO I=1 TO REPN;BVEC=RAND('BETA',DF1`/2,DF2`/2); 
AVEC=J(G,1,0);AVEC[1,1]=EXP(SUM(LOG(BVEC))); 
DO IG=2 TO G-1; 
AVEC[IG,1]=(1-BVEC[IG-1,1])#EXP(SUM(LOG(BVEC[IG:G-1,1])));END; 
AVEC[G,1]=1-BVEC[G-1,1]; 
LBVEC=(CVEC##2)#VARVEC/(NVEC#DFVEC); 
DFV=(LBVEC*AVEC)##2/(((LBVEC##2)/DFVEC)*(AVEC##2)); 
CRIT=TINV(1-ALPHA/2,DFV);H=DF#(LBVEC*AVEC)/VARPSI; 
EP=EP+SDF('T',CRIT#SQRT(H),DF,DELTA)+CDF('T',-CRIT#SQRT(H),DF,DELTA); 
END;EXP=EP/REPN;PRINT 'EXACT METHOD: POWER' EXP[FORMAT=8.4]; 

QUIT; 
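
As a cross-check on the inputs of the approximate method, the two quantities that feed its noncentral t power function (the noncentrality DELTA and the Satterthwaite degrees of freedom DFAP) can be recomputed outside SAS. The Python sketch below is an illustration under the same user specifications as the program above; the function name `satterthwaite_inputs` is hypothetical, and the sketch deliberately stops short of evaluating the noncentral t probabilities, which the SAS functions CDF and SDF handle in the original program.

```python
import math

def satterthwaite_inputs(means, variances, sizes, coefs):
    """Return (delta, df) as used by the approximate noncentral t method."""
    psi = sum(c * m for c, m in zip(coefs, means))
    # KV in the SAS program: c_i^2 * sigma_i^2 / N_i for each group.
    kv = [c * c * v / n for c, v, n in zip(coefs, variances, sizes)]
    var_psi = sum(kv)
    delta = psi / math.sqrt(var_psi)
    # DFAP = V1 / V2 with V1 = (sum KV)^2 and V2 = sum(KV^2 / (N_i - 1)).
    df = var_psi ** 2 / sum(k * k / (n - 1) for k, n in zip(kv, sizes))
    return delta, df

delta, df_approx = satterthwaite_inputs(
    means=[2.18, 0, 0, 0],
    variances=[1, 4, 9, 16],
    sizes=[10, 10, 10, 10],
    coefs=[1, -1 / 3, -1 / 3, -1 / 3],
)
print(round(delta, 4), round(df_approx, 4))
```

For these specifications the noncentrality is about 3.35 and the approximate degrees of freedom are about 29.94, noticeably fewer than the pooled value NT - G = 36 that a homogeneous-variance t test would use.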


APPENDIX B 

SAS IML program for computing the required sample size for 
Welch-Satterthwaite's test 


PROC IML;PRINT "SAMPLE SIZE CALCULATIONS"; 

*USER SPECIFICATIONS; 
ALPHA=0.05;                               *TYPE I ERROR; 
POWER=0.80;                               *NOMINAL POWER; 
MUVEC={71.3 93.9 77.1 93.3};              *GROUP MEANS; 
VARVEC={146.41 129.96 207.36 153.76};     *GROUP VARIANCES; 
RVEC={1 1 1 1};                           *SAMPLE SIZE RATIOS; 
CVEC={1 -1 -1 1};                         *COEFFICIENTS; 
*END OF SPECIFICATIONS; 

PSI=CVEC*MUVEC`;G=NCOL(VARVEC); 

*APPRO METHOD; 
N=3;DO UNTIL (EPAP>POWER);N=N+1;NVEC=N#RVEC;DFVEC=NVEC-1; 
VARPSI=(CVEC##2)*(VARVEC/NVEC)`;KV=(CVEC##2)#VARVEC/NVEC; 
V1=SUM(KV)##2;V2=SUM((KV##2)/DFVEC);DFAP=V1/V2; 
CRIT=TINV(1-ALPHA/2,DFAP);DELTA=PSI/SQRT(VARPSI); 
EPAP=CDF('T',-CRIT,DFAP,DELTA)+SDF('T',CRIT,DFAP,DELTA);END; 
PRINT 'APPROXIMATE T: POWER & N' EPAP[FORMAT=8.4] NVEC[FORMAT=4.0]; 

*EXACT METHOD; 
SEED=1001;CALL STREAMINIT(SEED);REPN=10000; 
N=MAX(NVEC[1,RVEC[1,>:<]]-5,3);DO UNTIL (EPEX>POWER); 
N=N+1;NVEC=N#RVEC;NT=SUM(NVEC);DFVEC=NVEC-1;DF=NT-G; 
VARPSI=(CVEC##2)*(VARVEC/NVEC)`;DELTA=PSI/SQRT(VARPSI); 
DF1=CUSUM(DFVEC[1,1:G-1]);DF2=DFVEC[1,2:G];EP=0; 
DO I=1 TO REPN;BVEC=RAND('BETA',DF1`/2,DF2`/2);AVEC=J(G,1,0); 
AVEC[1,1]=EXP(SUM(LOG(BVEC)));DO IG=2 TO G-1; 
AVEC[IG,1]=(1-BVEC[IG-1,1])#EXP(SUM(LOG(BVEC[IG:G-1,1])));END; 
AVEC[G,1]=1-BVEC[G-1,1];LBVEC=(CVEC##2)#VARVEC/(NVEC#DFVEC); 
DFV=(LBVEC*AVEC)##2/(((LBVEC##2)/DFVEC)*(AVEC##2)); 
CRIT=TINV(1-ALPHA/2,DFV);H=DF#(LBVEC*AVEC)/VARPSI; 
EP=EP+CDF('T',-CRIT#SQRT(H),DF,DELTA)+SDF('T',CRIT#SQRT(H),DF,DELTA); 
END;EPEX=EP/REPN;END; 

*THE EXACT POWER EPEX IS REPORTED HERE; 
PRINT 'EXACT METHOD: POWER & N' EPEX[FORMAT=8.4] NVEC[FORMAT=4.0]; 

QUIT; 
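
The Beta draws in the exact method can be read as a sequential construction of the weights AVEC, which sum to one and follow a Dirichlet distribution with parameters DFVEC/2. The Python sketch below is an illustration only, using the standard library rather than SAS: the function names `beta_mixture_weights` and `random_df`, the balanced group size of 10, and the 5000 replications are hypothetical choices. It mirrors the BVEC, AVEC, LBVEC, and DFV steps of the programs above and does not evaluate the noncentral t probabilities.

```python
import random

def beta_mixture_weights(dfvec, rng):
    """Build weights A_1..A_G from G-1 independent Beta draws (AVEC in SAS)."""
    g = len(dfvec)
    b, cum = [], 0.0
    for i in range(g - 1):
        cum += dfvec[i]  # running sum of degrees of freedom (DF1 in SAS)
        b.append(rng.betavariate(cum / 2, dfvec[i + 1] / 2))
    weights = []
    for i in range(g):
        w = 1.0 if i == 0 else 1.0 - b[i - 1]
        for j in range(i, g - 1):
            w *= b[j]
        weights.append(w)
    return weights

def random_df(weights, variances, sizes, coefs):
    """Random Satterthwaite degrees of freedom for one draw (DFV in SAS)."""
    dfs = [n - 1 for n in sizes]
    lb = [c * c * v / (n * d) for c, v, n, d in zip(coefs, variances, sizes, dfs)]
    num = sum(l * w for l, w in zip(lb, weights)) ** 2
    den = sum((l * l / d) * w * w for l, d, w in zip(lb, dfs, weights))
    return num / den

rng = random.Random(1001)
sizes = [10, 10, 10, 10]                      # hypothetical balanced design
variances = [146.41, 129.96, 207.36, 153.76]
coefs = [1, -1, -1, 1]
dfvec = [n - 1 for n in sizes]

draws = [beta_mixture_weights(dfvec, rng) for _ in range(5000)]
mean_weights = [sum(d[i] for d in draws) / len(draws) for i in range(len(sizes))]
dfv_values = [random_df(d, variances, sizes, coefs) for d in draws]

print([round(m, 3) for m in mean_weights])
print(round(min(dfv_values), 2), round(max(dfv_values), 2))
```

With equal group sizes each weight averages close to 1/G, and every simulated value of the random degrees of freedom stays between the smallest group degrees of freedom and their total, which is the range the approximate method replaces with a single fixed value.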